AITopics | structural pruning

Information about AI from the News, Publications, and Conferences

LLM-Pruner: On the Structural Pruning of Large Language Models

Xinyin Ma, Gongfan Fang, Xinchao Wang (National University of Singapore)

Neural Information Processing Systems, Feb-11-2026, 03:28:44 GMT

Large language models (LLMs) have shown remarkable capabilities in language understanding and generation.

arxiv preprint, large language model, machine learning

Country:

Asia > Singapore (0.40)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Texas > Bexar County > San Antonio (0.04)

Genre: Research Report > New Finding (0.93)

Industry: Consumer Products & Services > Restaurants (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Structural Pruning for Diffusion Models: Supplementary Materials

Gongfan Fang, Xinyin Ma, Xinchao Wang (National University of Singapore)

Neural Information Processing Systems, Feb-10-2026, 02:58:37 GMT

Table 1: Finetuning pruned models with more training steps. Note that the only difference lies in the position of the summation. It is easy to observe that our model converges rapidly. The dataset size of LSUN Bedroom is 44.48GB, which is … We conducted further investigations into the effectiveness of knowledge distillation in enhancing pruning. Table 3 profiles the pre-trained and the pruned models on a single A5000 GPU with a batch size of 1.

artificial intelligence, machine learning, pre-trained model

Country: Asia > Singapore (0.41)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

LLM-Pruner: On the Structural Pruning of Large Language Models

Neural Information Processing Systems, Dec-24-2025, 22:27:57 GMT

Large language models (LLMs) have shown remarkable capabilities in language understanding and generation. However, such impressive capability typically comes with a substantial model size, which presents significant challenges across the deployment, inference, and training stages. Since an LLM is a general-purpose task solver, we explore its compression in a task-agnostic manner, aiming to preserve the multi-task solving and language generation abilities of the original LLM. One challenge is the enormous size of the LLM training corpus, which makes both data transfer and model post-training overly burdensome. We therefore tackle the compression of LLMs under two constraints: being task-agnostic and minimizing reliance on the original training dataset. Our method, named LLM-Pruner, adopts structural pruning that selectively removes non-critical coupled structures based on gradient information, preserving most of the LLM's functionality. The performance of the pruned models can then be efficiently recovered through tuning techniques such as LoRA in merely 3 hours, requiring only 50K samples. We validate LLM-Pruner on three LLMs, LLaMA, Vicuna, and ChatGLM, and demonstrate that the compressed models still exhibit satisfactory capabilities in zero-shot classification and generation. The code will be made public.
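To make "removing coupled structures based on gradient information" concrete, here is a minimal NumPy sketch under stated assumptions: a toy two-layer MLP stands in for an LLM block, a coupled group is one hidden neuron (its row in `W1`, its entry in `b1`, and its column in `W2`), and each group is scored with the first-order Taylor term |w · ∂L/∂w| summed over the group. The function names and the toy model are hypothetical; this is an illustration of the idea, not LLM-Pruner's released code.

```python
import numpy as np

# Illustrative sketch only: toy model and names are hypothetical, not LLM-Pruner's code.

def mlp_forward(W1, b1, W2, x):
    """Toy two-layer MLP: h = relu(x @ W1.T + b1), y_hat = h @ W2.T."""
    z = x @ W1.T + b1
    h = np.maximum(z, 0.0)
    return z, h, h @ W2.T

def group_taylor_importance(W1, b1, W2, x, y):
    """Score each hidden neuron by the sum of |w * dL/dw| over its coupled parameters."""
    z, h, y_hat = mlp_forward(W1, b1, W2, x)
    # Gradient of the mean squared error L = mean((y_hat - y) ** 2)
    d_yhat = 2.0 * (y_hat - y) / y_hat.size
    gW2 = d_yhat.T @ h              # (out, hidden)
    d_z = (d_yhat @ W2) * (z > 0)   # backprop through ReLU, (n, hidden)
    gW1 = d_z.T @ x                 # (hidden, in)
    gb1 = d_z.sum(axis=0)           # (hidden,)
    # Coupled group of neuron j: row j of W1, entry j of b1, column j of W2
    return (np.abs(W1 * gW1).sum(axis=1)
            + np.abs(b1 * gb1)
            + np.abs(W2 * gW2).sum(axis=0))

def prune_neurons(W1, b1, W2, scores, n_remove):
    """Drop the n_remove lowest-scoring neurons from every coupled tensor together."""
    keep = np.sort(np.argsort(scores)[n_remove:])
    return W1[keep], b1[keep], W2[:, keep]

rng = np.random.default_rng(0)
x, y = rng.normal(size=(32, 8)), rng.normal(size=(32, 4))
W1, b1, W2 = rng.normal(size=(16, 8)), rng.normal(size=16), rng.normal(size=(4, 16))
scores = group_taylor_importance(W1, b1, W2, x, y)
W1p, b1p, W2p = prune_neurons(W1, b1, W2, scores, n_remove=4)
print(W1p.shape, b1p.shape, W2p.shape)  # (12, 8) (12,) (4, 12)
```

Pruning all members of a group at once is what keeps the shapes consistent: removing a row of `W1` without the matching column of `W2` would leave the two layers incompatible.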

llm-pruner, name change, structural pruning

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Accelerated Sparse Neural Training: A Provable and Efficient Method to Find N:M Transposable Masks

Neural Information Processing Systems, Dec-24-2025, 18:30:45 GMT

Unstructured pruning reduces the memory footprint of deep neural networks (DNNs). Recently, researchers have proposed different types of structural pruning that also aim to reduce computational complexity. In this work, we first propose a new measure called mask-diversity, which correlates with the expected accuracy of the different types of structural pruning. We focus on the recently proposed N:M fine-grained block sparsity mask, in which each block of M weights contains at least N zeros. While N:M fine-grained block sparsity enables acceleration on modern hardware, it can be used only to accelerate the inference phase.
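The N:M constraint as stated above (at least N zeros in every block of M weights) can be sketched in NumPy as follows. This assumes a simple magnitude criterion: the hypothetical `nm_mask` zeroes the N smallest-magnitude weights in each block of M consecutive entries along a row, and `is_transposable` checks whether a mask satisfies the constraint along both rows and columns, which is the property a transposable mask needs so that the transposed weight matrix is also N:M sparse. The greedy row-wise mask is not the paper's provable method and does not guarantee the column condition.

```python
import numpy as np

# Illustrative sketch: magnitude-based masks and a checker; names are hypothetical.

def nm_mask(W, N, M):
    """Row-wise mask with at least N zeros per block of M: drop the N smallest magnitudes."""
    rows, cols = W.shape
    if cols % M:
        raise ValueError("number of columns must be divisible by M")
    blocks = np.abs(W).reshape(rows, cols // M, M)
    drop = np.argsort(blocks, axis=-1)[..., :N]   # N smallest entries in each block
    mask = np.ones(blocks.shape, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=-1)
    return mask.reshape(rows, cols)

def satisfies_nm(mask, N, M):
    """True if every block of M consecutive entries in each row has at least N zeros."""
    rows, cols = mask.shape
    if cols % M:
        return False
    zeros = (~mask).reshape(rows, cols // M, M).sum(axis=-1)
    return bool((zeros >= N).all())

def is_transposable(mask, N, M):
    """Transposable: the N:M condition holds for the mask and for its transpose."""
    return satisfies_nm(mask, N, M) and satisfies_nm(mask.T, N, M)

W = np.array([[0.9, -0.1, 0.5, 0.05],
              [0.2, 0.8, -0.7, 0.3]])
print(nm_mask(W, 2, 4).astype(int))
# [[1 0 1 0]
#  [0 1 1 0]]

block_diag = np.array([[1, 1, 0, 0],
                       [1, 1, 0, 0],
                       [0, 0, 1, 1],
                       [0, 0, 1, 1]], dtype=bool)
print(is_transposable(block_diag, 2, 4))                                  # True
print(is_transposable(np.tile([True, True, False, False], (4, 1)), 2, 4))  # False
```

The last example satisfies 2:4 along every row but not along the columns, which is exactly the case a transposable mask rules out.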

accelerated sparse neural training, name change, provable and efficient method

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.58)

Structural Pruning for Diffusion Models

Neural Information Processing Systems, Dec-24-2025, 14:11:27 GMT

Generative modeling has recently undergone remarkable advances, driven primarily by Diffusion Probabilistic Models (DPMs). The impressive capability of these models, however, often entails significant computational overhead during both training and inference. To tackle this challenge, we present Diff-Pruning, an efficient compression method for learning lightweight diffusion models from pre-existing ones without extensive re-training. The core of Diff-Pruning is a Taylor expansion over pruned timesteps, a process that disregards non-contributory diffusion steps and ensembles informative gradients to identify important weights. Our empirical assessment across several datasets highlights two primary benefits of the proposed method: 1) Efficiency: it enables an approximately 50% reduction in FLOPs at only 10% to 20% of the original training cost; 2) Consistency: the pruned diffusion models inherently preserve generative behavior congruent with their pre-trained models.
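A timestep-aware Taylor criterion of this kind can be sketched as below, under several assumptions: per-timestep losses and gradients are supplied as arrays (in practice they would come from backpropagating the denoising loss at each timestep), a timestep counts as non-contributory when its loss is below a hypothetical `threshold` times the largest per-step loss, and each output channel is scored by |W ⊙ Σ_t g_t| summed over its weights. The names and the thresholding rule are illustrative, not the paper's exact procedure.

```python
import numpy as np

# Illustrative sketch: thresholding rule and names are hypothetical assumptions.

def select_timesteps(losses, threshold=0.1):
    """Keep timesteps whose loss is at least `threshold` times the largest per-step loss."""
    losses = np.asarray(losses, dtype=float)
    return np.flatnonzero(losses >= threshold * losses.max())

def timestep_taylor_importance(W, grads, losses, threshold=0.1):
    """Per-output-channel score |W * sum_t g_t|, summing gradients over kept timesteps only.

    W: weights of shape (out_channels, ...); grads: per-timestep gradients of shape (T, *W.shape).
    """
    kept = select_timesteps(losses, threshold)
    g_sum = grads[kept].sum(axis=0)       # ensemble gradients over informative steps
    taylor = np.abs(W * g_sum)            # first-order Taylor term for each weight
    return taylor.reshape(W.shape[0], -1).sum(axis=1)

W = np.array([[1.0, 2.0], [0.5, -1.0]])
grads = np.array([[[0.1, 0.1], [0.2, 0.0]],
                  [[0.3, -0.1], [0.0, 0.4]],
                  [[5.0, 5.0], [5.0, 5.0]]])  # toy: step 2 has large gradients...
losses = [1.0, 0.8, 0.01]                     # ...but a negligible loss, so it is skipped
print(select_timesteps(losses))               # [0 1]
print(timestep_taylor_importance(W, grads, losses))
```

Skipping the low-loss step keeps its large gradients from dominating the channel ranking.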

diffusion model, name change, structural pruning

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Structural Pruning via Latency-Saliency Knapsack

Neural Information Processing Systems, Dec-24-2025, 05:31:28 GMT

Structural pruning can simplify network architecture and improve inference speed. We propose Hardware-Aware Latency Pruning (HALP), which formulates structural pruning as a global resource allocation optimization problem, aiming to maximize accuracy while constraining latency to a predefined budget on the target device. For filter importance ranking, HALP leverages a latency lookup table to track latency reduction potential and a global saliency score to gauge accuracy drop. Both metrics can be evaluated very efficiently during pruning, which allows us to reformulate global structural pruning as a reward maximization problem under the target constraint. This makes the problem solvable via our augmented knapsack solver, enabling HALP to surpass prior work in pruning efficacy and accuracy-efficiency trade-off. We evaluate HALP on both classification and detection tasks, with varying networks, on the ImageNet and VOC datasets, and on different platforms.
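The knapsack view of latency-constrained pruning can be sketched with a plain 0/1 knapsack: each filter has a saliency (value) and a latency cost (weight) read from a lookup table, and we keep the subset that maximizes total saliency within the latency budget. This is a simplified illustration that assumes integer latency costs that are independent per filter; HALP's augmented solver additionally handles how a layer's latency depends on how many filters it keeps, which this sketch does not model. `knapsack_prune` is a hypothetical name.

```python
from typing import List, Tuple

# Simplified sketch: plain 0/1 knapsack, not HALP's augmented knapsack solver.

def knapsack_prune(saliency: List[float], latency: List[int],
                   budget: int) -> Tuple[float, List[int]]:
    """Return (total saliency, kept filter indices) maximizing saliency with latency <= budget."""
    n = len(saliency)
    # best[i][b]: max saliency using the first i filters within latency budget b
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        s, c = saliency[i - 1], latency[i - 1]
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]                 # option 1: prune filter i-1
            if c <= b and best[i - 1][b - c] + s > best[i][b]:
                best[i][b] = best[i - 1][b - c] + s     # option 2: keep filter i-1
    # Backtrack: a filter was kept exactly when the table value changed at its row
    kept, b = [], budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            kept.append(i - 1)
            b -= latency[i - 1]
    return best[n][budget], sorted(kept)

total, kept = knapsack_prune([0.9, 0.4, 0.6, 0.2], [3, 1, 2, 1], budget=4)
print(kept)  # [0, 1]
```

Here keeping filters 0 and 1 uses the full budget of 4 and beats the alternative {1, 2, 3}, which has the same latency but lower total saliency.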

latency-saliency knapsack, name change, structural pruning

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.60)

Safe and Optimal Learning from Preferences via Weighted Temporal Logic with Applications in Robotics and Formula 1

Karagulle, Ruya, Vasile, Cristian-Ioan, Ozay, Necmiye

arXiv.org Artificial Intelligence, Nov-12-2025

Autonomous systems increasingly rely on human feedback, expressed as pairwise comparisons, rankings, or demonstrations, to align their behavior. While existing methods can adapt behaviors, they often fail to guarantee safety in safety-critical domains. We propose a safety-guaranteed, optimal, and efficient approach to learning from preferences, rankings, or demonstrations using Weighted Signal Temporal Logic (WSTL). When implemented naively, WSTL learning problems lead to multi-linear constraints in the weights to be learned. By introducing structural pruning and log-transform procedures, we reduce the problem size and recast the problem as a Mixed-Integer Linear Program while preserving safety guarantees. Experiments on robotic navigation and real-world Formula 1 data demonstrate that the method effectively captures nuanced preferences and models complex task objectives. Autonomous systems are increasingly part of our daily lives, from driverless cars navigating cities to household robots performing domestic chores. Since these systems operate closely alongside humans, learning from human feedback is a natural way to ensure that their behavior aligns with human desires.
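As a generic illustration of why a log-transform helps (not the paper's exact WSTL encoding): when learned weights appear as a product in a constraint, the constraint is multi-linear in those weights, but if the weights are strictly positive, taking logarithms turns it into a constraint that is linear in the new variables u_i = log w_i.

```latex
\prod_{i=1}^{k} w_i \ge c \quad (w_i > 0,\; c > 0)
\quad\Longleftrightarrow\quad
\sum_{i=1}^{k} u_i \ge \log c, \qquad u_i = \log w_i .
```

For example, w_1 w_2 ≥ 2 becomes u_1 + u_2 ≥ log 2. The equivalence holds because log is strictly increasing on positive numbers. Linear constraints of this form can sit alongside binary variables in a Mixed-Integer Linear Program; how the paper treats terms that also involve signal robustness values is not covered by this illustration.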

artificial intelligence, machine learning, optimization problem

arXiv:2511.08502

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Sports > Motorsports > Formula One (0.71)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.54)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.48)

MosaicDiff: Training-free Structural Pruning for Diffusion Model Acceleration Reflecting Pretraining Dynamics

Guo, Bowei, Tang, Shengkun, Zeng, Cong, Shen, Zhiqiang

arXiv.org Artificial Intelligence, Oct-15-2025

Diffusion models are renowned for their generative capabilities, yet their pretraining processes exhibit distinct phases of learning speed that prior post-training acceleration efforts have entirely overlooked. In this study, we introduce a novel framework called MosaicDiff that aligns diffusion pretraining dynamics with post-training sampling acceleration via trajectory-aware structural pruning. Our approach leverages the observation that the middle, fast-learning stage of diffusion pretraining requires more conservative pruning to preserve critical model features, while the early and late slow-learning stages benefit from a more aggressive pruning strategy. This adaptive pruning mechanism is the first to explicitly mirror the inherent learning-speed variations of diffusion pretraining, thereby harmonizing the model's inner training dynamics with its accelerated sampling process. Extensive experiments on DiT and SDXL demonstrate that our method achieves significant sampling speed-ups without compromising output quality, outperforming previous state-of-the-art methods by large margins, and also offers a new viewpoint for more efficient and robust training-free diffusion acceleration.
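The stage-dependent strategy can be pictured as a schedule that maps each sampling step to a pruning ratio: conservative in the middle (fast-learning) stage and aggressive in the early and late (slow-learning) stages. This sketch assumes each step can be assigned to a stage by its normalized position along the sampling trajectory; the boundaries and ratios (`middle_start`, `middle_end`, `mild`, `aggressive`) are hypothetical placeholders, and MosaicDiff's own way of deriving stages from pretraining dynamics is not reproduced here.

```python
# Hypothetical stage-aware schedule; boundaries and ratios are illustrative placeholders.

def pruning_ratio(step: int, total_steps: int,
                  middle_start: float = 0.3, middle_end: float = 0.7,
                  mild: float = 0.1, aggressive: float = 0.5) -> float:
    """Return the fraction of structures to prune for the model used at `step`."""
    if total_steps <= 0 or not 0 <= step < total_steps:
        raise ValueError("step must lie in [0, total_steps)")
    progress = step / total_steps          # normalized position along the trajectory
    if middle_start <= progress < middle_end:
        return mild                        # fast-learning stage: conservative pruning
    return aggressive                      # slow-learning stages: aggressive pruning

schedule = [pruning_ratio(s, 10) for s in range(10)]
print(schedule)  # [0.5, 0.5, 0.5, 0.1, 0.1, 0.1, 0.1, 0.5, 0.5, 0.5]
```

A piecewise-constant schedule is the simplest form that captures the "conservative in the middle, aggressive at the ends" pattern; a smooth schedule would be an equally valid assumption.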

artificial intelligence, diffusion model, machine learning

arXiv:2510.11962

Country: Europe > Switzerland (0.28)

Genre:

Research Report > Promising Solution (0.48)
Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

35c1d69d23bb5dd6b9abcd68be005d5c-Supplemental-Conference.pdf

Neural Information Processing Systems, Oct-8-2025, 10:46:22 GMT

pre-trained model, pruning, training step

Country: Asia > Singapore (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

The Case for Instance-Optimized LLMs in OLAP Databases

Mohammadi, Bardia, Bindschaedler, Laurent

arXiv.org Artificial Intelligence, Jul-8-2025

Large Language Models (LLMs) can enhance analytics systems with powerful data summarization, cleaning, and semantic transformation capabilities. However, deploying LLMs at scale, processing millions to billions of rows, remains prohibitively expensive in computation and memory. We present IOLM-DB, a novel system that makes LLM-enhanced database queries practical through query-specific model optimization. Instead of using general-purpose LLMs, IOLM-DB generates lightweight, specialized models tailored to each query's specific needs using representative data samples. Using aggressive compression techniques, including quantization, sparsification, and structural pruning, IOLM-DB reduces model footprints by up to 76% and increases throughput by up to 3.31× while maintaining accuracy. We further show how our approach enables higher parallelism on existing hardware and seamlessly supports caching and batching strategies to reduce overheads. Our prototype demonstrates that LLM queries inside analytics systems are feasible at scale, opening new possibilities for future OLAP applications.
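Of the compression techniques listed, quantization is the simplest to illustrate. The sketch below shows generic symmetric per-tensor int8 quantization in NumPy, which stores each float32 weight in one byte, a 75% reduction for the weight storage alone (ignoring the single float scale). It is an illustration of the general technique under that assumption, not IOLM-DB's actual pipeline, and the function names are hypothetical.

```python
import numpy as np

# Generic int8 quantization sketch; not IOLM-DB's pipeline, names are hypothetical.

def quantize_int8(W):
    """Symmetric per-tensor quantization: W ~= scale * q with q in [-127, 127]."""
    max_abs = float(np.abs(W).max())
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to approximate float32 weights."""
    return q.astype(np.float32) * np.float32(scale)

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 256)).astype(np.float32)
q, scale = quantize_int8(W)
err = np.abs(dequantize(q, scale) - W).max()  # rounding error is at most about scale / 2
print(W.nbytes, q.nbytes)                     # 262144 65536
```

A single per-tensor scale keeps the sketch short; per-channel scales are a common refinement when weight ranges differ across rows.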

large language model, machine learning, natural language

arXiv:2507.04967

Country: Europe (0.93)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
